Experiments with Infinite-Horizon, Policy-Gradient Estimation
Abstract
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, 2001), which computes biased estimates of the performance gradient in POMDPs. The algorithm’s chief advantages are that it uses only one free param...
Similar resources
Infinite-Horizon Policy-Gradient Estimation
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward ...
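The abstract above describes GPOMDP only in outline. The following is a minimal Python sketch of the GPOMDP update as we understand it from the companion paper: an eligibility trace z, discounted by beta in [0, 1), accumulates the score of each sampled action, and a running average Delta of reward times trace forms the gradient estimate. The tabular softmax policy, the names `gpomdp`, `toy_observe` and `toy_step`, and the two-state toy problem are hypothetical choices made for illustration; this is not the authors' code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical environment interface (not from the paper):
#   observe(state) -> observation index
#   step(state, action, rng) -> (next_state, reward of next_state)
Observe = Callable[[int], int]
Step = Callable[[int, int, random.Random], Tuple[int, float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp(theta: List[List[float]], observe: Observe, step: Step,
           start_state: int, beta: float, T: int,
           rng: random.Random) -> List[List[float]]:
    """GPOMDP-style estimate of the average-reward gradient for an assumed
    tabular softmax policy mu(u | y) = softmax(theta[y])[u]."""
    n_obs, n_act = len(theta), len(theta[0])
    z = [[0.0] * n_act for _ in range(n_obs)]      # eligibility trace z_t
    delta = [[0.0] * n_act for _ in range(n_obs)]  # running estimate Delta_t
    state = start_state
    for t in range(T):
        y = observe(state)
        probs = softmax(theta[y])
        u = rng.choices(range(n_act), weights=probs)[0]
        # z_{t+1} = beta * z_t + grad(mu_u) / mu_u (score of the sampled action);
        # for a softmax row, d log mu_u / d theta[y][a] = 1[a == u] - mu_a.
        for o in range(n_obs):
            for a in range(n_act):
                z[o][a] *= beta
        for a in range(n_act):
            z[y][a] += (1.0 if a == u else 0.0) - probs[a]
        state, reward = step(state, u, rng)
        # Delta_{t+1} = Delta_t + (r(i_{t+1}) * z_{t+1} - Delta_t) / (t + 1)
        for o in range(n_obs):
            for a in range(n_act):
                delta[o][a] += (reward * z[o][a] - delta[o][a]) / (t + 1)
    return delta


# Toy problem (hypothetical): the chosen action becomes the next state,
# state 1 pays reward 1, and the observation is the state itself.
def toy_observe(state: int) -> int:
    return state


def toy_step(state: int, action: int, rng: random.Random) -> Tuple[int, float]:
    return action, 1.0 if action == 1 else 0.0


grad = gpomdp([[0.0, 0.0], [0.0, 0.0]], toy_observe, toy_step,
              start_state=0, beta=0.5, T=50_000, rng=random.Random(0))
print(grad)
```

In the toy problem, with theta = 0 the average reward is eta = p0 / (1 - p1 + p0), where p_y is the probability of choosing action 1 after observation y, so the exact gradient is -0.125 for the action-0 logit and +0.125 for the action-1 logit in each row; the printed estimate should land near those values. A larger beta lets the trace credit actions further back in time at the cost of higher variance, which is the bias-variance trade-off this single parameter controls.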
Convergence of trajectories in infinite horizon optimization
In this paper, we investigate the convergence of a sequence of minimizing trajectories in infinite-horizon optimization problems. Convergence is considered in the sense of ideals and of their special case, statistical convergence. Optimality is defined in terms of the total cost over the infinite horizon.
Stabilizing Policy Improvement for Large-Scale Infinite-Horizon Dynamic Programming
Today’s focus on sustainability within industry presents a modeling challenge that may be dealt with using dynamic programming over an infinite time horizon. However, the curse of dimensionality often results in a large number of states in these models. These large-scale models require numerically stable solution methods. The best method for infinite-horizon dynamic programming depends on both ...
Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs
Decentralized POMDPs (DEC-POMDPs) offer a rich model for planning under uncertainty in multiagent settings. Improving the scalability of solution techniques is an important challenge. While an optimal algorithm has been developed for infinite-horizon DEC-POMDPs, it often requires an intractable amount of time and memory. To address this problem, we present a heuristic version of this algorithm. ...
Journal
Journal title: Journal of Artificial Intelligence Research
Year: 2001
ISSN: 1076-9757
DOI: 10.1613/jair.807